Fix race condition in WaitForInstanceAsync causing intermittent test failures #574
…End test to fail
The original code had a TOCTOU race condition where a completion notification could be missed if the orchestration completed between checking completion status and adding the waiter. The fix reorders the operations to add the waiter first, then check for completion.
Co-authored-by: YunchuWang <[email protected]>
LGTM
Pull request overview
This PR fixes an intermittent race condition in WaitForInstanceAsync that was causing the RestartAsync_EndToEnd test to fail. The issue occurred when an orchestration completed between checking its status and registering a waiter, resulting in missed completion notifications and 30-second timeouts.
Key Changes:
- Reordered operations in WaitForInstanceAsync to register the waiter before checking completion status, eliminating the race condition window
- Added clear comments explaining the race condition fix and the two-step registration-then-check pattern
Co-authored-by: Copilot <[email protected]>
@copilot update pr description to follow .github/PULL_REQUEST_TEMPLATE.md
RestartAsync_EndToEnd(restartWithNewInstanceId: True) was failing intermittently due to a TOCTOU race condition in WaitForInstanceAsync.
Problem
The original code checked completion status before adding a waiter. If the orchestration completed between these two steps, the completion notification fired while no waiter was registered, so the caller never received it and waited until the 30-second timeout expired.
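The lost notification can be reproduced deterministically in a small sketch. It uses Python's asyncio rather than the project's own code, and `InstanceStore`, `wait_check_then_register`, and the `between_steps` hook are hypothetical names introduced only to force the completion into the window between the two steps. A 0.05-second timeout stands in for the 30 seconds mentioned above.

```python
import asyncio


class InstanceStore:
    """Hypothetical in-memory store, used only to illustrate the race."""

    def __init__(self):
        self.completed = set()
        self.waiters = {}  # instance_id -> list of pending futures

    def complete(self, instance_id):
        self.completed.add(instance_id)
        # Only waiters registered at this moment are notified.
        for fut in self.waiters.pop(instance_id, []):
            if not fut.done():
                fut.set_result(instance_id)

    async def wait_check_then_register(self, instance_id, timeout, between_steps=None):
        # Step 1: check completion status.
        if instance_id in self.completed:
            return instance_id
        # Race window: a completion that lands here notifies nobody.
        if between_steps is not None:
            between_steps()
        # Step 2: register the waiter, possibly after the notification fired.
        fut = asyncio.get_running_loop().create_future()
        self.waiters.setdefault(instance_id, []).append(fut)
        return await asyncio.wait_for(fut, timeout)


async def demo_missed_notification():
    store = InstanceStore()
    try:
        await store.wait_check_then_register(
            "abc", timeout=0.05, between_steps=lambda: store.complete("abc")
        )
        return "completed"
    except asyncio.TimeoutError:
        return "timed out"
```

Running `asyncio.run(demo_missed_notification())` takes the timeout path even though the instance has already completed, which mirrors the intermittent test failure described above.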
Fix
Reorder operations to add waiter first, then check status:
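A minimal sketch of the register-then-check order, written in Python with asyncio rather than taken from the project; `InstanceStore`, `wait_register_then_check`, and the `between_steps` hook are hypothetical names used only for illustration:

```python
import asyncio


class InstanceStore:
    """Hypothetical in-memory store, used only to illustrate the pattern."""

    def __init__(self):
        self.completed = set()
        self.waiters = {}  # instance_id -> list of pending futures

    def complete(self, instance_id):
        self.completed.add(instance_id)
        # Notify every waiter registered so far.
        for fut in self.waiters.pop(instance_id, []):
            if not fut.done():
                fut.set_result(instance_id)

    async def wait_register_then_check(self, instance_id, timeout, between_steps=None):
        # Step 1: register the waiter first, so a later completion will find it.
        fut = asyncio.get_running_loop().create_future()
        self.waiters.setdefault(instance_id, []).append(fut)
        # Test hook: simulate a completion landing between the two steps.
        if between_steps is not None:
            between_steps()
        # Step 2: check status, which covers completions that happened earlier.
        if instance_id in self.completed:
            pending = self.waiters.get(instance_id, [])
            if fut in pending:
                pending.remove(fut)  # drop the waiter that is no longer needed
            return instance_id
        return await asyncio.wait_for(fut, timeout)


async def demo_completion_in_window():
    store = InstanceStore()
    return await store.wait_register_then_check(
        "abc", timeout=0.05, between_steps=lambda: store.complete("abc")
    )


async def demo_completed_earlier():
    store = InstanceStore()
    store.complete("abc")
    return await store.wait_register_then_check("abc", timeout=0.05)


async def demo_completed_later():
    store = InstanceStore()
    # Completion arrives after the wait has started and notifies the waiter.
    asyncio.get_running_loop().call_later(0.01, store.complete, "abc")
    return await store.wait_register_then_check("abc", timeout=1.0)
```
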
This ensures we either: